Search for: All records

Creators/Authors contains: "Klivans, A"

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The Power of Iterative Filtering for Supervised Learning with (Heavy) Contamination

Klivans, A; Stavropoulos, K; Tian, K; Vasilyan, A (May 2025, https://doi.org/10.48550/arXiv.2505.20177)

Inspired by recent work on learning with distribution shift, we give a general outlier removal algorithm called iterative polynomial filtering and show a number of striking applications for supervised learning with contamination: (1) We show that any function class that can be approximated by low-degree polynomials with respect to a hypercontractive distribution can be efficiently learned under bounded contamination (also known as nasty noise). This is a surprising resolution to a longstanding gap between the complexity of agnostic learning and learning with contamination, as it was widely believed that low-degree approximators only implied tolerance to label noise. (2) For any function class that admits the (stronger) notion of sandwiching approximators, we obtain near-optimal learning guarantees even with respect to heavy additive contamination, where far more than 1/2 of the training set may be added adversarially. Prior related work held only for regression and in a list-decodable setting. (3) We obtain the first efficient algorithms for tolerant testable learning of functions of halfspaces with respect to any fixed log-concave distribution. Even the non-tolerant case for a single halfspace in this setting had remained open. These results significantly advance our understanding of efficient supervised learning under contamination, a setting that has been much less studied than its unsupervised counterpart.
more » « less
Free, publicly-accessible full text available May 26, 2026
Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension

Chandrasekaran, G; Klivans, A; Kontonis, V; Meka, R; Stavropoulos, K (April 2025, https://doi.org/10.48550/arXiv.2407.00966)

In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on ℝd×{±1} -- is to output a hypothesis that is competitive (to within ϵ) of the best fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time kpoly(logkϵγ) where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999).
more » « less
Free, publicly-accessible full text available April 30, 2026
Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees

Chandrasekaran, G; Klivans, A R; Lee, L L; Stavropoulos, K (February 2025, https://doi.org/10.48550/arXiv.2502.16021)

We give the first provably efficient algorithms for learning neural networks with distribution shift. We work in the Testable Learning with Distribution Shift framework (TDS learning) of Klivans et al. (2024), where the learner receives labeled examples from a training distribution and unlabeled examples from a test distribution and must either output a hypothesis with low test error or reject if distribution shift is detected. No assumptions are made on the test distribution. All prior work in TDS learning focuses on classification, while here we must handle the setting of nonconvex regression. Our results apply to real-valued networks with arbitrary Lipschitz activations and work whenever the training distribution has strictly sub-exponential tails. For training distributions that are bounded and hypercontractive, we give a fully polynomial-time algorithm for TDS learning one hidden-layer networks with sigmoid activations. We achieve this by importing classical kernel methods into the TDS framework using data-dependent feature maps and a type of kernel matrix that couples samples from both train and test distributions.
more » « less
Free, publicly-accessible full text available February 22, 2026
Learning the Sherrington-Kirkpatrick Model Even at Low Temperature

Chandrasekaran, G; Klivans, A (November 2024, https://doi.org/10.48550/arXiv.2411.11174)

We consider the fundamental problem of learning the parameters of an undirected graphical model or Markov Random Field (MRF) in the setting where the edge weights are chosen at random. For Ising models, we show that a multiplicative-weight update algorithm due to Klivans and Meka learns the parameters in polynomial time for any inverse temperature β≤logn‾‾‾‾‾√. This immediately yields an algorithm for learning the Sherrington-Kirkpatrick (SK) model beyond the high-temperature regime of β<1. Prior work breaks down at β=1 and requires heavy machinery from statistical physics or functional inequalities. In contrast, our analysis is relatively simple and uses only subgaussian concentration. Our results extend to MRFs of higher order (such as pure p-spin models), where even results in the high-temperature regime were not known.
more » « less
Full Text Available
Learning Constant-Depth Circuits in Malicious Noise Models

Klivans, A; Stavropoulos, K; Vasilyan, A (November 2024, https://doi.org/10.48550/arXiv.2411.03570)

The seminal work of Linial, Mansour, and Nisan gave a quasipolynomial-time algorithm for learning constant-depth circuits (𝖠𝖢0) with respect to the uniform distribution on the hypercube. Extending their algorithm to the setting of malicious noise, where both covariates and labels can be adversarially corrupted, has remained open. Here we achieve such a result, inspired by recent work on learning with distribution shift. Our running time essentially matches their algorithm, which is known to be optimal assuming various cryptographic primitives. Our proof uses a simple outlier-removal method combined with Braverman's theorem for fooling constant-depth circuits. We attain the best possible dependence on the noise rate and succeed in the harshest possible noise model (i.e., contamination or so-called "nasty noise").
more » « less
Full Text Available
Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds

Klivans, A; Stavropoulos, K; Vasilyan, A (May 2024, https://doi.org/10.48550/arXiv.2404.02364)

Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution , unlabeled samples from test distribution ′, and the goal is to output a classifier with low error on ′ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on ′. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a 2(k/ϵ)O(1)𝗉𝗈𝗅𝗒(d)-time algorithm for TDS learning intersections of k homogeneous halfspaces to accuracy ϵ (prior work achieved d(k/ϵ)O(1)). We work under the mild assumption that the Gaussian training distribution contains at least an ϵ fraction of both positive and negative examples (ϵ-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the ϵ-balanced assumption is necessary for 𝗉𝗈𝗅𝗒(d,1/ϵ)-time TDS learning for a single halfspace and (2) a dΩ̃ (log1/ϵ) lower bound for the intersection of two general halfspaces, even with the ϵ-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.
more » « less
Full Text Available
Testable Learning with Distribution Shift

Klivans, A; Stavropoulos, K; Vasilyan, A (May 2024, https://doi.org/10.48550/arXiv.2311.15142)

We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution D, unlabeled samples from test distribution D′ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between D and D′. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from D and D′ pass an associated test; moreover, the test must accept if the marginal of D equals the marginal of D′. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of D is Gaussian or uniform on {±1}d. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on D′. For halfspaces in the realizable case (where there exists a halfspace consistent with both D and D′), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree L2-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
more » « less
Full Text Available
An Efficient Tester-Learner for Halfspaces

Gollakota, A; Klivans, A R; Stavropoulos, K; Vasilyan, A (May 2024, 12th International Conference on Learning Representations (ICLR 2024))

We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan [2022]. In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is the standard Gaussian in dimensions and the label noise is either Massart or adversarial (agnostic). For Massart noise, our tester-learner runs in polynomial time and outputs a hypothesis with (information-theoretically optimal) error (and extends to any fixed strongly log-concave target distribution). For adversarial noise, our tester-learner obtains error in polynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels and combine them with the moment-matching approach of Gollakota et al. [2022]. This enables us to implement a testable variant of the algorithm of Diakonikolas et al. [2020a, 2020b] for learning noisy halfspaces using nonconvex SGD.
more » « less
Full Text Available
Evolution-Inspired Loss Functions for Protein Representation Learning

Gong, C; Klivans, A; Loy, J M; Chen, T; Liu, Q; Diaz, D J (April 2024, ICLR 2024)

AI-based frameworks for protein engineering use self-supervised learning (SSL) to obtain representations for downstream biological predictions. The most common training objective for these methods is wildtype accuracy: given a sequence or structure where a wildtype residue has been masked, predict the missing amino acid. Wildtype accuracy, however, does not align with the primary goal of protein engineering, which is to suggest a {\em mutation} rather than to identify what already appears in nature. Here we present Evolutionary Ranking (EvoRank), a training objective that incorporates evolutionary information derived from multiple sequence alignments (MSAs) to learn more diverse protein representations. EvoRank corresponds to ranking amino-acid likelihoods in the probability distribution induced by an MSA. This objective forces models to learn the underlying evolutionary dynamics of a protein. Across a variety of phenotypes and datasets, we demonstrate that EvoRank leads to dramatic improvements in zero-shot performance and can compete with models fine-tuned on experimental data. This is particularly important in protein engineering, where it is expensive to obtain data for fine-tuning.
more » « less
Full Text Available
Predicting a Protein's Stability under a Million Mutations

Ouyang-Zhang, J; Diaz, D J; Klivans, A; Krähenbühl, P (October 2023, https://doi.org/10.48550/arXiv.2310.12979)

Stabilizing proteins is a foundational step in protein engineering. However, the evolutionary pressure of all extant proteins makes identifying the scarce number of mutations that will improve thermodynamic stability challenging. Deep learning has recently emerged as a powerful tool for identifying promising mutations. Existing approaches, however, are computationally expensive, as the number of model inferences scales with the number of mutations queried. Our main contribution is a simple, parallel decoding algorithm. Our Mutate Everything is capable of predicting the effect of all single and double mutations in one forward pass. It is even versatile enough to predict higher-order mutations with minimal computational overhead. We build Mutate Everything on top of ESM2 and AlphaFold, neither of which were trained to predict thermodynamic stability. We trained on the Mega-Scale cDNA proteolysis dataset and achieved state-of-the-art performance on single and higher-order mutations on S669, ProTherm, and ProteinGym datasets.
more » « less
Full Text Available

« Prev Next »